Practice in Synonym Extraction at Large Scale

Authors

  • Liangliang Cao
  • Chang Wang
Abstract

Synonym extraction is an important task in natural language processing and is often used as a submodule in query expansion, question answering, and other applications. An automatic synonym extractor is highly preferred for large-scale applications. Previous studies in synonym extraction are mostly limited to small-scale datasets. In this paper, we build a large dataset with 3.4 million synonym/nonsynonym pairs to capture the challenges of real-world scenarios. We propose (1) a new cost function to accommodate the unbalanced learning problem, and (2) a feature-learning-based deep neural network to model the complicated relationships in synonym pairs. We compare several approaches based on SVMs and neural networks, and find that a novel feature-learning-based neural network outperforms the methods with hand-assigned features. Specifically, the best performance of our model surpasses the SVM baseline with a significant 97% relative improvement.

Similar resources

Turning Distributional Thesauri into Word Vectors for Synonym Extraction and Expansion

In this article, we investigate a new problem: turning a distributional thesaurus into dense word vectors. More precisely, we propose a method for performing such a task by associating graph embedding and distributed representation adaptation. We have applied it to English nouns at a large scale and evaluated its ability to retrieve synonyms. In this context, we have...

Optimizing Synonym Extraction Using Monolingual and Bilingual Resources

Automatically acquiring synonymous words (synonyms) from corpora is a challenging task. For this task, methods that use only one kind of resource are inadequate because of low precision or low recall. To improve the performance of synonym extraction, we propose a method to extract synonyms with multiple resources including a monolingual dictionary, a bilingual corpus, and a large monolingual c...

Evaluation of Updating Methods in Building Blocks Dataset

With the increasing use of spatial data in daily life, the production of this data from diverse information sources with different precision and scales has grown widely. Generating new data requires a great deal of time and money. Therefore, one solution to reduce costs is to update the old data at different scales using new data (produced on a similar scale). One approach to updating data i...

The Combination Process for Preparative Separation and Purification of Paclitaxel and 10-Deacetylbaccatin III Using Diaion® Hp-20 Followed by Hydrophilic Interaction Based Solid Phase Extraction

There is no other naturally occurring defense agent against cancer that has a stronger effect than paclitaxel, commonly known under the brand name of Taxol®. The major drawback for the more widespread use of paclitaxel and its precious precursor, 10-deacetylbaccatin III (10-DAB III), is that they require large-scale extraction from different parts of yew trees (Taxus species), cell cultures, taxane...

Medical Synonym Extraction with Concept Space Models

In this paper, we present a novel approach for medical synonym extraction. We aim to integrate the term embedding with the medical domain knowledge for healthcare applications. One advantage of our method is that it is very scalable. Experiments on a dataset with more than 1M term pairs show that the proposed approach outperforms the baseline approaches by a large margin.


Journal:
  • CoRR

Volume: abs/1412.2197

Publication date: 2014